
    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
    This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
    Peer Reviewed. Postprint (published version)
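
As a rough illustration of what "topology-aware" placement means, the sketch below picks, for a job that needs k GPUs on one node, the set of free GPUs whose weakest pairwise link is strongest, breaking ties by total link bandwidth. It is not the algorithm from the paper: the bandwidth table `LINKS`, the helpers `bandwidth` and `place`, and the max-min scoring rule are all hypothetical, and interference between co-located jobs is not modeled.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical per-pair link bandwidths in arbitrary units (not measured values).
# GPUs 0-1 and 2-3 share fast links; every cross pair uses a slower path.
LINKS: Dict[FrozenSet[int], float] = {
    frozenset((0, 1)): 40.0,
    frozenset((2, 3)): 40.0,
    frozenset((0, 2)): 10.0,
    frozenset((0, 3)): 10.0,
    frozenset((1, 2)): 10.0,
    frozenset((1, 3)): 10.0,
}


def bandwidth(links: Dict[FrozenSet[int], float], a: int, b: int) -> float:
    """Bandwidth between GPUs a and b; 0.0 if the pair is not in the table."""
    return links.get(frozenset((a, b)), 0.0)


def place(free: Set[int], k: int,
          links: Dict[FrozenSet[int], float]) -> Optional[Tuple[int, ...]]:
    """Choose k free GPUs, maximizing the weakest pairwise link, then the total.

    Returns None when fewer than k GPUs are free. Exhaustive search is only
    reasonable for the handful of GPUs found in a single node.
    """
    best: Optional[Tuple[int, ...]] = None
    best_key: Optional[Tuple[float, float]] = None
    for combo in combinations(sorted(free), k):
        links_used = [bandwidth(links, a, b) for a, b in combinations(combo, 2)]
        weakest = min(links_used, default=float("inf"))  # single GPU: no links
        key = (weakest, sum(links_used))
        if best_key is None or key > best_key:  # ties keep the first combo seen
            best, best_key = combo, key
    return best
```

With this example table, a 2-GPU job lands on (0, 1) when all GPUs are free and on (2, 3) once GPU 0 is busy, instead of straddling a slower cross-pair link. Scoring by the weakest link favors communication-heavy jobs, whose collective operations tend to be bounded by the slowest path.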

    Performance Evaluation of Microservices Architectures using Containers

    Microservices architecture has started a new trend in application development for a number of reasons: (1) to reduce complexity by using tiny services; (2) to scale, remove, and deploy parts of the system easily; (3) to improve flexibility in using different frameworks and tools; (4) to increase overall scalability; and (5) to improve the resilience of the system. Containers have empowered the use of microservices architectures by being lightweight, providing fast start-up times, and having low overhead. Containers can be used to develop applications based either on a monolithic architecture, where the whole system runs inside a single container, or on a microservices architecture, where one or a few processes run inside each container. Two models can be used to implement a microservices architecture using containers: master-slave or nested-container. The goal of this work is to compare CPU and network performance, by running benchmarks, in these two models of microservices architecture, and hence to provide benchmark-analysis guidance for system designers.
    Comment: Submitted to the 14th IEEE International Symposium on Network Computing and Applications (IEEE NCA15). Partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595) - HiEST Project

    Managing SLAs of heterogeneous workloads using dynamic application placement

    In this paper we address the problem of managing heterogeneous workloads in a virtualized data center. We consider two different workloads: transactional applications and long-running jobs. We present a technique that permits collocation of these workload types on the same physical hardware. Our technique dynamically modifies workload placement by leveraging control mechanisms such as suspension and migration, and strives to optimally trade off resource allocation among these workloads in spite of their differing characteristics and performance objectives. Our approach builds upon our previous work on dynamically placing transactional workloads. This paper extends our framework with the capability to manage long-running workloads. We achieve this goal by using utility functions, which permit us to compare the performance of various workloads, and which are used to drive allocation decisions. We demonstrate that our technique maximizes heterogeneous workload performance while providing service differentiation based on high-level performance goals.
    Postprint (published version)

    Utility-based placement of dynamic Web applications with fairness goals

    We study the problem of dynamic resource allocation to clustered Web applications. We extend application server middleware with the ability to automatically decide the size of application clusters and their placement on physical machines. Unlike existing solutions, which focus on maximizing resource utilization and may unfairly treat some applications, the approach introduced in this paper considers the satisfaction of each application with a particular resource allocation and attempts to at least equally satisfy all applications. We model satisfaction using utility functions, mapping CPU resource allocation to the performance of an application relative to its objective. The demonstrated online placement technique aims at equalizing the utility value across all applications while also satisfying operational constraints, preventing the over-allocation of memory, and minimizing the number of placement changes. We have implemented our technique in a leading commercial middleware product. Using this real-life testbed and a simulation, we demonstrate the benefit of the utility-driven technique as compared to other state-of-the-art techniques.
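
The idea of equalizing utility across applications can be sketched as a bisection on a common utility level: for a candidate level, compute the smallest CPU share that brings each application to that level, and keep the highest level whose total demand fits in the available capacity. This is only a sketch under stated assumptions: the utility shapes, the helpers `cpu_for` and `equalize`, and the per-application caps are hypothetical, and the memory constraints and the minimization of placement changes mentioned in the abstract are not modeled.

```python
from typing import Callable, List

# A utility function maps a CPU allocation to satisfaction relative to the
# application's objective; here it is assumed to be nondecreasing.
Utility = Callable[[float], float]


def cpu_for(u: Utility, target: float, cap: float, iters: int = 60) -> float:
    """Smallest CPU in [0, cap] (up to bisection precision) with u(cpu) >= target."""
    if u(cap) < target:
        return float("inf")  # this application cannot reach the target at all
    lo, hi = 0.0, cap
    for _ in range(iters):
        mid = (lo + hi) / 2
        if u(mid) >= target:
            hi = mid  # target already met: try less CPU
        else:
            lo = mid
    return hi


def equalize(utils: List[Utility], caps: List[float], capacity: float,
             iters: int = 60) -> List[float]:
    """Give every application the same utility level, as high as capacity allows."""
    lo = min(u(0.0) for u in utils)              # a level every app meets with no CPU
    hi = min(u(c) for u, c in zip(utils, caps))  # highest level every app can reach
    for _ in range(iters):
        mid = (lo + hi) / 2
        demand = sum(cpu_for(u, mid, c) for u, c in zip(utils, caps))
        if demand <= capacity:
            lo = mid  # feasible: aim for a higher common level
        else:
            hi = mid
    return [cpu_for(u, lo, c) for u, c in zip(utils, caps)]
```

For example, with two applications whose utilities are min(c/2, 1) and min(c/4, 1) (so they need 2 and 4 CPUs to meet their objectives) and 3 CPUs in total, the sketch assigns about 1 and 2 CPUs, leaving both at utility 0.5. A utilization-maximizing allocator could instead satisfy the first application fully and starve the second, which is the unfairness the abstract describes.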

    Enabling distributed key-value stores with low latency-impact snapshot support

    Current distributed key-value stores generally provide greater scalability at the expense of weaker consistency and isolation. However, additional isolation support is becoming increasingly important in the environments in which these stores are deployed, where different kinds of applications with different needs are executed, from transactional workloads to data analytics. While fully-fledged ACID support may not be feasible, it is still possible to take advantage of the design of these data stores, which often include the notion of multiversion concurrency control, to equip them with additional features at a much lower performance cost while maintaining their scalability and availability. In this paper we explore the effects that additional consistency guarantees and isolation capabilities may have on a state-of-the-art key-value store: Apache Cassandra. We propose and implement a new multiversioned isolation level that provides stronger guarantees without compromising Cassandra's scalability and availability. As shown in our experiments, our version of Cassandra allows Snapshot Isolation-like transactions, preserving the overall performance and scalability of the system.
    This work is partially supported by the Ministry of Science and Technology of Spain and the European Union’s FEDER funds (TIN2007-60625, TIN2012-34557), by the Generalitat de Catalunya (2009-SGR-980), by the BSC-CNS Severo Ochoa program (SEV-2011-00067), by the HiPEAC European Network of Excellence (IST-004408, FP7-ICT-217068, FP7-ICT-287759), and by IBM through the 2008 and 2010 IBM Faculty Award program.
    Peer Reviewed. Postprint (author’s final draft)
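
The general idea of snapshot isolation on top of multiversioned data can be sketched in memory: each committed write becomes a new version tagged with a commit timestamp, a transaction reads the newest version committed before it started, and a commit aborts if another transaction committed a write to the same key in the meantime (first committer wins). This is a generic textbook scheme with hypothetical class names (`MVStore`, `Txn`, `ConflictError`), not the isolation level the authors implemented in Cassandra; it is single-process and single-threaded and ignores replication and failures.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ConflictError(Exception):
    """Raised when a commit would overwrite a version committed after the txn began."""


class MVStore:
    """Toy single-process multiversion store: key -> [(commit_ts, value), ...]."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}
        self.clock = itertools.count(1)  # one counter for start and commit timestamps

    def begin(self) -> "Txn":
        return Txn(self, next(self.clock))

    def read_at(self, key: str, ts: int) -> Optional[str]:
        """Newest value committed no later than ts, or None if there is none."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def latest_commit(self, key: str) -> int:
        history = self.versions.get(key)
        return history[-1][0] if history else 0


class Txn:
    def __init__(self, store: MVStore, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts
        self.writes: Dict[str, str] = {}  # buffered until commit

    def get(self, key: str) -> Optional[str]:
        if key in self.writes:  # read your own writes
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)  # snapshot read

    def put(self, key: str, value: str) -> None:
        self.writes[key] = value

    def commit(self) -> int:
        # First committer wins: abort on a write-write conflict with any
        # transaction that committed after this one started.
        for key in self.writes:
            if self.store.latest_commit(key) > self.start_ts:
                raise ConflictError(key)
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

In a short trace, a transaction that starts before another one commits a new value for `x` keeps reading the old value of `x`, and its own later attempt to write `x` is rejected with `ConflictError`. Readers never block writers here, which is the property that makes this style of isolation cheap on stores that already keep multiple versions.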

    Some Performance Aspects of Trading Service Implementations

    This paper presents some performance aspects of Trader Service implementations. The Trader Service activity is analysed to identify the operations that are crucial for performance. Then two Trader models, the OMG Trader and the ANSA Trader, are analysed, and their two implementations, OrbixTrader and the ANSA Trader ported to Orbix2.1, are compared. The tests applied to measure performance are described, and their results are presented along with the identification of the implementation aspects that have the greatest impact on performance.
    1 Introduction
    Applications in distributed systems are increasing in complexity and functionality. A large number of clients and servers may be present in such a system, and they need to locate and use functionality provided by other processes in the system. The Trading Service allows clients to dynamically locate servers based on descriptions of requirements, rather than looking for specific instances of servers. The Trading Service has been recognized...
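
Locating servers by description rather than by instance can be sketched as a small in-memory trader: servers export offers (service type, object reference, properties), and clients query by service type with a constraint and an optional preference ordering. The structure loosely follows the export and query operations of the OMG Trading Object Service, but Python predicates stand in for the OMG constraint language, and the class names and references such as `ior:A` are hypothetical. The sketch only shows the lookup semantics; it says nothing about the performance characteristics the paper measures.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Properties = Dict[str, Any]


@dataclass
class Offer:
    service_type: str
    reference: str  # hypothetical object reference (e.g. a stringified IOR)
    properties: Properties = field(default_factory=dict)


class Trader:
    def __init__(self) -> None:
        self.offers: List[Offer] = []

    def export(self, offer: Offer) -> None:
        """A server advertises a service offer."""
        self.offers.append(offer)

    def query(self, service_type: str,
              constraint: Callable[[Properties], bool],
              preference: Optional[Callable[[Properties], Any]] = None,
              how_many: int = 1) -> List[Offer]:
        """Return up to how_many offers of the given type that satisfy the
        constraint, ordered by ascending preference key when one is given."""
        matches = [o for o in self.offers
                   if o.service_type == service_type and constraint(o.properties)]
        if preference is not None:
            matches.sort(key=lambda o: preference(o.properties))
        return matches[:how_many]
```

For instance, after three "Printer" offers are exported, a client asking for a colour printer with the shortest queue receives the matching offer's reference without knowing in advance which server it is. The linear scan over all offers is the simplest choice; the cost of constraint matching as the number of offers grows is exactly the kind of operation whose performance matters in a real trader.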

    Server virtualization in autonomic management of heterogeneous workloads
